(54) Java hardware accelerator using microcode engine



(57) A hardware Java accelerator comprises a decode
stage and a microcode stage. Separating the accelerator
into a decode stage and a microcode stage allows the
decode stage to implement instruction level parallelism,
while the microcode stage allows the conversion of a
single Java bytecode into multiple native instructions.
A reissue buffer is provided which stores the converted
instructions and reissues them when the system returns
from an interrupt. In this manner, the hardware
accelerator need not be flushed upon an interrupt. A
native PC monitor is also used. While the native PC is
within a specific range, the hardware accelerator is
enabled to convert the Java bytecodes into native
instructions. When the native PC is outside the range,
the hardware accelerator is disabled and the CPU
operates on native instructions obtained from the memory.
Description

Background of the Invention 

[0001] Java™ is an object-oriented programming
language developed by Sun Microsystems. The Java 
language is small, simple and portable across platforms 
and operating systems, both at the source and at the 
binary level. This makes the Java programming lan- 
guage very popular on the Internet. 
[0002] Java's platform independence and code com- 
paction are the most significant advantages of Java over 
conventional programming languages. In conventional 
programming languages, the source code of a program 
is sent to a compiler which translates the program into 
machine code or processor instructions. The processor 
instructions are native to the system's processor. If the 
code is compiled on an Intel-based system, the resulting 
program will only run on other Intel-based systems. If it 
is desired to run the program on another system, the 
user must go back to the original source code, obtain a 
compiler for the new processor, and recompile the pro- 
gram into the machine code specific to that other proc- 
essor. 

[0003] Java operates differently. The Java compiler 
takes a Java program and, instead of generating ma- 
chine code for a particular processor, generates byte- 
codes. Bytecodes are instructions that look like machine 
code, but aren't specific to any processor. To execute a 
Java program, a bytecode interpreter takes the Java
bytecodes, converts them to equivalent native processor
instructions, and executes the Java program. The Java
bytecode interpreter is one component of the Java Vir- 
tual Machine. 

[0004] Having the Java programs in bytecode form 
means that instead of being specific to any one system, 
the programs can run on any platform and any operating 
system as long as a Java Virtual Machine is available. This
allows a binary bytecode file to be executable across 
platforms. 

[0005] The disadvantage of using bytecodes is
execution speed. System-specific programs that run
directly on the hardware for which they are compiled run
significantly faster than Java bytecodes, which must be
processed by the Java Virtual Machine. The processor
must both convert the Java bytecodes into native in- 
structions in the Java Virtual Machine and execute the 
native instructions. 

[0006] One way to speed up the Java Virtual Machine 
is by techniques such as the "Just in Time" (JIT) inter- 
preter, and even faster interpreters known as "Hot Spot 
JITs" interpreters. The JIT versions all result in a JIT 
compile overhead to generate native processor instruc- 
tions. These JIT interpreters also result in additional 
memory overhead. 

[0007] The slow execution speed of Java and the
overhead of JIT interpreters have made it difficult for
consumer appliances requiring low-cost solutions with
minimal memory usage and low energy consumption to
run Java programs. The performance requirements for
existing processors using the fastest JITs more than
double to support running the Java Virtual Machine in
software. The processor performance requirements
could be met by employing superscalar processor
architectures or by increasing the processor clock
frequency. In both cases, the power requirements are
dramatically increased. The memory bloat that results
from JIT techniques also goes against the consumer
application requirements of low cost and low power.
[0008] It is desired to have an improved system for
implementing Java programs that provides a low-cost
solution for running Java programs for consumer
appliances.

Summary of the Invention 

[0009] The present invention generally relates to Java
hardware accelerators used to translate Java bytecodes
into native instructions for a central processing unit
(CPU). One embodiment of the present invention
comprises a reissue buffer, the reissue buffer associated
with a hardware accelerator and adapted to store
converted native instructions issued to the CPU along
with associated native program counter values. When the
CPU returns from an interrupt, the reissue buffer
examines the program counter to determine whether to
reissue a stored native instruction value from the reissue
buffer. In this way, returns from interrupts can be
efficiently handled without reloading the hardware
accelerator with the instructions to convert.
[0010] Another embodiment of the present invention
comprises a hardware accelerator to convert stack-based
instructions into register-based instructions native
to a central processing unit. The hardware accelerator
includes a native program counter monitor. The native
program counter monitor checks whether the native
program counter is within a hardware accelerator program
counter range. When the native program counter is
within the hardware accelerator program counter range,
the hardware accelerator is enabled and converted
native instructions are sent to the CPU from the
hardware accelerator; the native program counter is
not used to determine instructions to load from memory.
[0011] In this manner, the hardware accelerator can
spoof the native program counter to be within a certain
range which corresponds to the program counter range
in which the stack-based instructions are stored. By
monitoring the program counter, the hardware
accelerator can always tell when it needs to operate and
when it does not. Thus, if an interrupt occurs, causing
the native program counter to move to a range outside of
the hardware accelerator program counter range, there
need be no explicit instruction to the hardware
accelerator from the CPU handling the interrupt to stall
the hardware accelerator.

[0012] Yet another embodiment of the present
invention comprises a hardware accelerator operably
connected to a central processing unit, the hardware
accelerator adapted to convert stack-based instructions
into register-based instructions native to the central
processing unit. The hardware accelerator includes a
microcode stage. The microcode stage includes
microcode memory. The microcode memory output includes a
number of fields, the fields including a first set of fields 
corresponding to native instruction fields and a control 
bit field which affects the interpretation of the first set of 
fields by the microcode controlled logic to produce a na- 
tive instruction. Use of a microcode portion allows the 
same general hardware accelerator architecture to work 
with a variety of central processing units. In a preferred 
embodiment, the microcode portion is separate from a 
decode portion. 

Brief Description of the Drawings 

[0013] The present invention may be further under- 
stood from the following description in conjunction with 
the drawings. 

Figure 1 is a diagram of the system of the parent 
invention including a hardware Java accelerator. 

Figure 2 is a diagram illustrating the use of the hard- 
ware Java accelerator of the parent invention. 

Figure 3 is a diagram illustrating some of the details
of a Java hardware accelerator of one embodiment 
of the parent invention. 

Figure 4 is a diagram illustrating the details of one 
embodiment of a Java accelerator instruction trans- 
lation in the system of the parent invention. 

Figure 5 is a diagram illustrating the instruction
translation operation of one embodiment of the par- 
ent invention. 

Figure 6 is a diagram illustrating the instruction 
translation system of one embodiment of the parent 
invention using instruction level parallelism. 

Figure 7 is a table of exception bytecodes for one 
embodiment of the parent invention. 

Figure 8 is a diagram of one embodiment of a hard- 
ware accelerator used with one embodiment of the 
present invention. 

Figure 9 is a diagram that illustrates the decode 
stage for use in the hardware accelerator of the 
present invention. 

Figure 10 is a diagram that illustrates one embodiment
of an instruction decode unit used with the decode
stage of Figure 9.

Figure 11 is a diagram that illustrates one embodiment
of a microcode stage for use with the embodiment
of Figure 8.

Figure 12 is a diagram of a microcode address logic
used with the microcode stage of Figure 11.

Figure 13 is a diagram of a native instruction composer
unit used with the embodiment of Figure 11.

Figure 14 is a diagram of a register selection logic
used with the native instruction composer unit of
Figure 13.

Figure 15 illustrates a stack-and-variable-register
manager of one embodiment of the present invention.

Figure 16 illustrates a stack-and-variable-register
manager of an alternate embodiment of the present
invention.

Figure 17 is a diagram of the native PC monitor
used with one embodiment of the present invention.

Figure 18 is a diagram of a reissue buffer used with
one embodiment of the present invention.

Figures 19 and 20 are diagrams that illustrate the
operation of one embodiment of the present invention.

Detailed Description of the Preferred Embodiments

[0014] Figures 1-7 illustrate the operation of the
parent application.

[0015] Figure 1 is a diagram of the system 20 showing
the use of a hardware Java accelerator 22 in conjunction
with a central processing unit 26. The Java hardware
accelerator 22 allows part of the Java Virtual Machine
to be implemented in hardware. This hardware
implementation speeds up the processing of the Java
bytecodes. In particular, in a preferred embodiment, the
translation of the Java bytecodes into native processor
instructions is at least partially done in the hardware
Java accelerator 22. This translation has been part of a
bottleneck in the Java Virtual Machine when implemented
in software. In Figure 1, instructions from the
instruction cache 24 or other memory are supplied to the
hardware Java accelerator 22. If these instructions are
Java bytecodes, the hardware Java accelerator 22 can
convert these bytecodes into native processor
instructions which are supplied through the multiplexer
28 to the CPU. If non-Java code is used, the hardware
accelerator can be bypassed using the multiplexer 28.
The Java stack includes the frame, the operand stack,
the variables, etc.

[0016] The Java hardware accelerator can do some 
or all of the following tasks: 

1. Java bytecode decode;

2. identifying and encoding instruction level paral- 
lelism (ILP), wherever possible; 

3. translating bytecodes to native instructions; 

4. managing the Java stack on a register file asso- 
ciated with the CPU or as a separate stack; 

5. generating exceptions on predetermined Java
bytecodes;

6. switching to native CPU operation when native 
CPU code is provided; 

7. performing bounds checking on array instruc- 
tions; and 

8. managing the variables on the register file asso- 
ciated with the CPU. 

[0017] In a preferred embodiment, the Java Virtual
Machine functions of bytecode interpreter, Java register,
and Java stack are implemented in the hardware Java
accelerator. The garbage collection heap and constant
pool area can be maintained in normal memory and
accessed through normal memory referencing. In one
embodiment, these functions are accelerated in hardware,
e.g. write barrier.

[0018] The major advantages of the Java hardware
accelerator are that it increases the speed at which the
Java Virtual Machine operates and allows existing native
language legacy applications, software base, and
development tools to be used. A dedicated microprocessor in
which the Java bytecodes were the native instructions 
would not have access to those legacy applications. 
[0019] Although the Java hardware accelerator is 
shown in Figure 1 as separate from the central process- 
ing unit, the Java hardware accelerator can be incorpo- 
rated into a central processing unit. In that case, the cen- 
tral processing unit has a Java hardware accelerator 
subunit to translate Java bytecode into the native in- 
structions operated on by the main portion of the CPU. 
[0020] Figure 2 is a state machine diagram that shows 
the operation of one embodiment of the parent inven- 
tion. Block 32 is the power-on state. During power-on, 
the multiplexer 28 is set to bypass the Java hardware 
accelerator. In block 34, the native instruction boot-up 
sequence is run. Block 36 shows the system in the na- 
tive mode executing native instructions and by-passing 
the Java hardware accelerator. 
[0021] In block 38, the system switches to the Java 
hardware accelerator mode. In the Java hardware ac- 
celerator mode, Java bytecode is transferred to the Java 
hardware accelerator 22, converted into native instruc- 
tions then sent to the CPU for operation. 
[0022] The Java accelerator mode can produce
exceptions at certain Java bytecodes. These bytecodes
are not processed by the hardware accelerator 22 but
are processed in the CPU 26. As shown in block 40, the
system operates in the native mode but the Java Virtual
Machine is implemented in the accelerator which does
the bytecode translation and handles the exception
created in the Java accelerator mode.
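
The mode transitions described for Figure 2 can be sketched as a small state machine. This is an illustrative model only: the event names, the `next_mode` function, and the return path from block 40 back to the accelerator mode are assumptions introduced here and are not taken from the figure.

```python
from enum import Enum, auto


class Mode(Enum):
    POWER_ON = auto()          # block 32: multiplexer bypasses the accelerator
    BOOT = auto()              # block 34: native boot-up sequence
    NATIVE = auto()            # block 36: native instructions, accelerator bypassed
    JAVA_ACCEL = auto()        # block 38: bytecodes converted by the accelerator
    NATIVE_EXCEPTION = auto()  # block 40: CPU handles an exception bytecode


# Hypothetical event names; the description does not name the triggers.
TRANSITIONS = {
    (Mode.POWER_ON, "reset_done"): Mode.BOOT,
    (Mode.BOOT, "boot_done"): Mode.NATIVE,
    (Mode.NATIVE, "enter_java"): Mode.JAVA_ACCEL,
    (Mode.JAVA_ACCEL, "exception_bytecode"): Mode.NATIVE_EXCEPTION,
    (Mode.JAVA_ACCEL, "exit_java"): Mode.NATIVE,
    (Mode.NATIVE_EXCEPTION, "exception_handled"): Mode.JAVA_ACCEL,  # assumed
}


def next_mode(mode, event):
    """Return the next mode; unknown events leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


def accelerator_bypassed(mode):
    """The multiplexer bypasses the accelerator in every mode but JAVA_ACCEL."""
    return mode is not Mode.JAVA_ACCEL
```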

[0023] The longer and more complicated bytecodes
that are difficult to handle in hardware can be selected 
to produce the exceptions. Figure 7 is a table showing 
one possible list of bytecodes which can cause excep- 
tions in a preferred embodiment. 

[0024] Figure 3 is a diagram illustrating details of one
embodiment of the Java hardware accelerator of the
parent invention. The Java hardware accelerator
includes Java accelerator instruction translation hardware
42. The instruction translation unit 42 is used to convert
Java bytecodes to native instructions. One embodiment
of the Java accelerator instruction translation hardware
42 is described in more detail below with respect to
Figure 4. This instruction translation hardware 42 uses
data stored in hardware Java registers 44. The hardware
Java registers store the Java registers defined in the
Java Virtual Machine. The Java registers contain the
state of the Java Virtual Machine, affect its operation,
and are updated at runtime. The Java registers in the
Java Virtual Machine include the PC, the program
counter indicating what bytecode is being executed; Optop,
a pointer to the top of the operand stack; Frame, a
pointer to the execution environment of the current
method; and Java variables (Vars), a pointer to the first
local variable available of the currently executing
method. The virtual machine defines these registers to
be a single 32-bit word wide. The Java registers are also
stored in the Java stack, which can be implemented as the
hardware Java stack 50, or the Java stack can be stored
in the CPU associated register file.
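
As an illustration of the register set just listed, the four Java registers can be modelled as 32-bit fields. This is a sketch under stated assumptions: the class name `JavaRegisters`, the field names, and the masking behaviour are chosen here for illustration and do not describe the hardware registers 44.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # each Java register is a single 32-bit word wide


@dataclass
class JavaRegisters:
    pc: int = 0     # which bytecode is being executed
    optop: int = 0  # pointer to the top of the operand stack
    frame: int = 0  # pointer to the current method's execution environment
    vars: int = 0   # pointer to the first local variable of the current method

    def __setattr__(self, name, value):
        # Assumption: values wrap to 32 bits, as a hardware register would.
        object.__setattr__(self, name, value & MASK32)
```

For example, `JavaRegisters(pc=0x100000005).pc` evaluates to `5` because the value is truncated to 32 bits.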

[0025] In a preferred embodiment, the hardware Java
registers 44 can include additional registers for the use
of the instruction translation hardware 42. These
registers can include a register indicating a switch to
native instructions, configuration and control registers,
and a register indicating the version number of the system.
[0026] The Java PC can be used to obtain bytecode
instructions from the instruction cache 24 or memory. In
one embodiment the Java PC is multiplexed with the
normal program counter 54 of the central processing
unit 26 in multiplexer 52. The normal PC 54 is not used
during the operation of the Java hardware bytecode
translation. In another embodiment, the normal program
counter 54 is used as the Java program counter.
[0027] The Java registers are a part of the Java Virtual
Machine and should not be confused with the general
registers 46 or 48 which are operated upon by the
central processing unit 26. In one embodiment, the system
uses the traditional CPU register file 46 as well as a Java
CPU register file 48. When native code is being operated
upon, the multiplexer 56 connects the conventional
register file 46 to the execution logic 26c of the CPU 26.
When the Java hardware accelerator is active, the Java
CPU register file 48 substitutes for the conventional
CPU register file 46. In another embodiment, the
conventional CPU register file 46 is used.
[0028] As described below with respect to Figures 3 
and 4, the Java CPU register file 48, or in an alternate 
embodiment the conventional CPU register file 46, can 
be used to store portions of the operand stack and some 
of the variables. In this way, the native register-based 
instructions from the Java accelerator instruction
translator 42 can operate upon the operand stack and
variable values stored in the Java CPU register file 48, or the
values stored in the conventional CPU register file 46. 
Data can be written in and out of the Java CPU register 
file 48 from the data cache or other memory 58 through 
the overflow/underflow line 60 connected to the memory 
arbiter 62 as well as issued load/store instructions. The 
overflow/underflow transfer of data to and from the 
memory can be done concurrently with the CPU opera- 
tion. Alternately, the overflow/underflow transfer can be 
done explicitly while the CPU is not operating. The over- 
flow/underflow bus 60 can be implemented as a tri-state 
bus or as two separate buses to read data in and write 
data out of the register file when the Java stack over- 
flows or underflows. 

[0029] The register files for the CPU could alternately 
be implemented as a single register file with native in- 
structions used to manipulate the loading of operand 
stack and variable values to and from memory. Alter- 
nately, multiple Java CPU register files could be used: 
one register file for variable values, another register file 
for the operand stack values, and another register file 
for the Java frame stack holding the method environ- 
ment information. 

[0030] The Java accelerator controller (co-processing 
unit) 64 can be used to control the hardware Java ac- 
celerator, read in and out from the hardware Java reg- 
isters 44 and Java stack 50, and flush the Java accel- 
erator instruction translation pipeline upon a "branch 
taken" signal from the CPU execute logic 26c. 
[0031] The CPU 26 is divided into pipeline stages in- 
cluding the instruction fetch 26a, instruction decode 
26b, execute logic 26c, memory access logic 26d, and 
writeback logic 26e. The execute logic 26c executes the 
native instructions and thus can determine whether a 
branch instruction is taken and issue the "branch taken" 
signal. In one embodiment, the execute logic 26c mon- 
itors addresses for detecting branches. Figure 4 illus- 
trates an embodiment of a Java accelerator instruction 
translator which can be used with the parent invention. 
The instruction buffer 70 stores the bytecode instruc- 
tions from the instruction cache. The bytecodes are sent 
to a parallel decode unit 72 which decodes multiple byte- 
codes at the same time. Multiple bytecodes are proc- 
essed concurrently in order to allow for instruction level 
parallelism. That is, multiple bytecodes may be convert- 
ed into a lesser number of native instructions. 
[0032] The decoded bytecodes are sent to a state
machine unit 74 and Arithmetic Logic Unit (ALU) 76. The
ALU 76 is provided to rearrange the bytecode
instructions to make them easier to be operated on by the
state machine 74 and perform various arithmetic
functions including computing memory references. The
state machine 74 converts the bytecodes into native
instructions using the lookup table 78. Thus, the state
machine 74 provides an address which indicates the
location of the desired native instruction in the
microcode look-up table 78. Counters are maintained to
keep a count of how many entries have been placed on the
operand stack, as well as to keep track of and update the
top of the operand stack in memory and in the register
file. In a preferred embodiment, the output of the
microcode look-up table 78 is augmented with indications
of the registers to be operated on in the native CPU
register file at line 80. The register indications are
from the counters and interpreted from bytecodes. To
accomplish this, it is necessary to have a hardware
indication of which operands and variables are in which
entries in the register file. Native instructions are
composed on this basis. Alternately, these register
indications can be sent directly to the Java CPU register
file 48 shown in Figure 3.

[0033] The state machine 74 has access to the Java
registers in 44 as well as an indication of the
arrangement of the stack and variables in the Java CPU
register file 48 or in the conventional CPU register file
46. The buffer 82 supplies the translated native
instructions to the CPU.

[0034] The operation of the Java hardware accelerator
of one embodiment of the parent invention is
illustrated in Figures 5 and 6. Figure 5, section I shows the
instruction translation of the Java bytecode. The Java
bytecode corresponding to the mnemonic iadd is
interpreted by the Java virtual machine as an integer
operation taking the top two values of the operand stack,
adding them together and pushing the result on top of
the operand stack. The Java translating machine
translates the Java bytecode into a native instruction such as
the instruction ADD R1, R2. This is an instruction native
to the CPU indicating the adding of the value in register R1
to the value in register R2 and the storing of this result
in register R2. R1 and R2 are the top two entries in the
operand stack.
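
The iadd translation above can be sketched in a few lines. This is a minimal model, assuming the operand stack entries are tracked as a list of register names with the top of the stack last; the function names `translate_iadd` and `execute_add` are hypothetical, and the 32-bit wrap-around reflects Java integer addition rather than anything stated for the native CPU.

```python
def translate_iadd(stack_regs):
    """Translate iadd into a two-operand native ADD.

    stack_regs lists the registers holding operand stack entries, top of
    stack last. The top entry is consumed; the result lands in the register
    holding the second entry, which becomes the new top of stack.
    """
    if len(stack_regs) < 2:
        raise ValueError("iadd needs two operand stack entries in registers")
    top = stack_regs.pop()
    second = stack_regs[-1]
    return f"ADD {top}, {second}"


def execute_add(instr, regfile):
    """Execute 'ADD src, dst' on a dict register file: dst = src + dst."""
    op, operands = instr.split(" ", 1)
    if op != "ADD":
        raise ValueError(f"unsupported instruction: {instr}")
    src, dst = (r.strip() for r in operands.split(","))
    regfile[dst] = (regfile[src] + regfile[dst]) & 0xFFFFFFFF
```

With R1 holding the top of the stack and R2 the entry below it, `translate_iadd(["R3", "R2", "R1"])` yields `ADD R1, R2`, and R1 is then free to be overwritten.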

[0035] As shown in Figure 5, section II, the Java
register includes a PC value of "Value A" that is
incremented to "Value A+1". The Optop value changes from
"Value B" to "Value B-1" to indicate that the top of the
operand stack is at a new location. The Vars base value,
which points to the start of the variable list, is not
modified. In Figure 5, section III, the contents of a native
CPU register file or a Java CPU register file, 48 in
Figure 3, are shown. The Java CPU register file starts off
with registers R0-R5 containing operand stack values and
registers R6-R7 containing variable values. Before the
operation of the native instruction, register R1 contains
the top value of the operand stack. Register R6 contains
the first variable. Hardware is used to detect the
availability of the Vars in the register file. If the Var
is not available, the hardware in conjunction with
microcode issues load instructions to the native CPU. Once
the value of a Var has been updated in the register file
(RF), that entry is marked as being modified so that, when
doing method calls, only the updated Vars are written back
to memory. This results in significantly higher
performance methods. After the execution of the native
instruction, register R2 now contains the top value of the
operand stack. Register R1 no longer contains a valid
operand stack value and is available to be overwritten by
an operand stack value.

[0036] Figure 5, section IV, shows the memory
locations of the operand stack and variables which can be
stored in the data cache 58 or in main memory. For
convenience, the memory is illustrated without illustrating
any virtual memory scheme. Before the native
instruction executes, the address of the top of the operand
stack, Optop, is "Value B". After the native instruction
executes, the address of the top of the operand stack is
"Value B-1" containing the result of the native
instruction. Note that the operand stack value "4427" can be
written into register R1 across the overflow/underflow
line 60. Upon a switch back to the native mode, the data
in the Java CPU register file 48 should be written to the
data memory.
[0037] Consistency must be maintained between the
hardware Java registers 44, the Java CPU register file
48 and the data memory. The CPU 26 and Java
accelerator instruction translation unit 42 are pipelined and
any changes to the hardware Java registers 44 and
changes to the control information for the Java CPU
register file 48 must be able to be undone upon a "branch
taken" signal. The system preferably uses buffers (not
shown) to ensure this consistency. Additionally, the Java
instruction translation must be done so as to avoid
pipeline hazards in the instruction translation unit and CPU.
[0038] Figure 6 is a diagram illustrating the operation
of instruction level parallelism with the parent invention.
In Figure 6 the Java bytecodes iload_n and iadd are
converted by the Java bytecode translator to the single
native instruction ADD R6, R1. In the Java Virtual Machine,
iload_n pushes the top local variable indicated by the
Java register Var onto the operand stack.
[0039] In the parent invention the Java hardware
translator can combine the iload_n and iadd bytecodes
into a single native instruction. As shown in Figure 6,
section II, the Java register, PC, is updated from "Value A"
to "Value A+2". The Optop value remains "Value B". The
value Var remains at "Value C".

[0040] As shown in Figure 6, section III, after the
native instruction ADD R6, R1 executes, the value of the
first local variable stored in register R6, "1221", is added
to the value of the top of the operand stack contained in
register R1 and the result stored in register R1. In Figure
6, section IV, the Optop value does not change but the
value in the top of the register contains the result of the
ADD instruction, 1371. This example shows the present
invention operating with a native CPU supporting only
two operands. The invention can also support three
operands and Very Long Instruction Word (VLIW) CPUs.
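
The folding of iload_n and iadd into one native instruction can be illustrated as follows. This is a sketch, not the decode logic of the accelerator: the `fold` function and the `VAR_REGS` mapping are assumptions, with variables 0 and 1 placed in R6 and R7 as in the Figure 5 register layout. The starting value of R1 (150) is inferred here from the stated values 1221 and 1371.

```python
# Assumed mapping of local variable index to register (R6-R7 hold variables).
VAR_REGS = {0: "R6", 1: "R7"}


def fold(bytecodes, top_reg):
    """Try to fold a leading iload_n + iadd pair into one native ADD.

    Returns (native_instruction, bytecodes_consumed); (None, 0) if the pair
    cannot be folded. The Java PC advances by the number consumed, while
    Optop is unchanged because the push and the pop cancel out.
    """
    if len(bytecodes) >= 2 and bytecodes[1] == "iadd":
        name, _, index = bytecodes[0].partition("_")
        if name == "iload" and index.isdigit() and int(index) in VAR_REGS:
            return f"ADD {VAR_REGS[int(index)]}, {top_reg}", 2
    return None, 0


instr, consumed = fold(["iload_0", "iadd"], "R1")
regfile = {"R6": 1221, "R1": 150}
if instr == "ADD R6, R1":
    regfile["R1"] = regfile["R6"] + regfile["R1"]  # R1 now holds 1371
```

Two single-byte bytecodes are consumed, which matches the PC moving from "Value A" to "Value A+2".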
[0041] For some bytecodes such as SiPush, BiPush,
etc., the present invention makes available sign extend- 
ed data for the immediate field of the native instruction 
being composed (120) by the hardware and microcode. 
This data can alternatively be read as a coprocessor 
register. The coprocessor register read/write instruction 
can be issued by hardware accelerator as outlined in 
the present invention. Additionally, the microcode has 
several fields that aid in composing the native instruc- 
tion. 
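
Sign extension of the bytecode operand can be sketched as below. In the Java Virtual Machine, bipush carries a one-byte signed operand and sipush a two-byte signed operand in big-endian order; the helper names `sign_extend` and `immediate_for` are hypothetical and only illustrate the value that would be placed in the immediate field.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def immediate_for(opcode, operand_bytes):
    """Return the sign-extended immediate for a bipush or sipush bytecode."""
    if opcode == "bipush":
        return sign_extend(operand_bytes[0], 8)
    if opcode == "sipush":
        return sign_extend((operand_bytes[0] << 8) | operand_bytes[1], 16)
    raise ValueError(f"no immediate operand modelled for {opcode}")
```

For instance, `immediate_for("bipush", [0xFF])` gives `-1` and `immediate_for("sipush", [0xFF, 0xFE])` gives `-2`.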

[0042] The Java hardware accelerator of the parent 
invention is particularly well suited to an embedded
solution in which the hardware accelerator is positioned on
the same chip as the existing CPU design. This allows 
the prior existing software base and development tools 
for legacy applications to be used. In addition, the archi- 
tecture of the present embodiment is scalable to fit a 
variety of applications ranging from smart cards to desk- 
top solutions. This scalability is implemented in the Java 
accelerator instruction translation unit of Figure 4. For 
example, the lookup table 78 and state machine 74 can 
be modified for a variety of different CPU architectures. 
These CPU architectures include reduced instruction 
set computer (RISC) architectures as well as complex 
instruction set computer (CISC) architectures. The 
present invention can also be used with superscalar 
CPUs or very long instruction word (VLIW) computers. 
[0043] Figures 8-20 illustrate the operation of the
present invention. Figure 8 is a diagram that shows a
system 100 of one embodiment of the present invention.
The system includes a CPU 101 and a hardware
accelerator. The hardware accelerator portion includes a
decode stage 102 for receiving the Java bytecode from the
memory. Decode stage 102 preferably uses instruction
level parallelism in which more than one Java bytecode
can be converted into a single native instruction. In a
preferred embodiment, the system 100 includes a
microcode stage 104 which receives signals from the
decode stage 102 and is used to construct the native
instructions. The microcode stage 104 allows for the
production of multiple native instructions from a single
bytecode. The reissue buffer 106 stores a copy of the
converted instructions as they are sent to the CPU 101.

[0044] The reissue buffer 106 monitors the native PC
value 110. In a preferred embodiment, when the
hardware accelerator is active, the hardware accelerator
does not use the native PC value to determine the
memory location to load the instructions from memory. The
native PC value is instead maintained within a spoofed
range which indicates that the hardware accelerator is
active. In a preferred embodiment, the native PC
monitor 110 detects whether the native PC value is within
the spoofed range. If so, the multiplexer 112 sends the
converted instructions from the hardware accelerator to
the CPU 101. If not, the native instructions from memory
are loaded to the CPU 101. When in the spoofed range,
the addresses sourced to memory are the Java PC from
the accelerator. Otherwise the native PC is sourced to
memory.
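
The selection performed by the native PC monitor and the multiplexer can be modelled as a range check. This is a behavioural sketch only: the class and function names are hypothetical, the range bounds are supplied by the caller, and the half-open interval is an assumption, since the description does not state how the spoofed range is delimited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NativePCMonitor:
    range_start: int  # first address of the spoofed range (assumed inclusive)
    range_end: int    # end of the spoofed range (assumed exclusive)

    def accelerator_enabled(self, native_pc):
        """True while the native PC lies inside the spoofed range."""
        return self.range_start <= native_pc < self.range_end


def select_instruction(monitor, native_pc, converted, from_memory):
    """Model of the multiplexer: pick the instruction stream for the CPU."""
    return converted if monitor.accelerator_enabled(native_pc) else from_memory


def memory_address(monitor, native_pc, java_pc):
    """Address sourced to memory: the Java PC inside the range, else the native PC."""
    return java_pc if monitor.accelerator_enabled(native_pc) else native_pc
```

An interrupt that moves the native PC outside the range therefore disables the accelerator path without any explicit command from the CPU.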

[0045] If an interrupt occurs, the native PC value will
go to a value outside the spoofed range. The PC monitor
110 will then stall the hardware accelerator. When a
return from interrupt occurs, the CPU 101 will be flushed,
and upon return from interrupt, the native PC value 108
is returned to the PC value prior to the interrupt. The
reissue buffer 106 will then reissue to the CPU 101 the
stored native instructions flushed from the CPU 101 that
correspond to this prior native PC value. With the use of
this system, the hardware accelerator does not need to be
flushed upon an interrupt, nor do previously converted
Java bytecodes need to be reloaded into the hardware
accelerator. The use of the reissue buffer 106 can thus
speed the operation and recovery from interrupt.
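
The behaviour of the reissue buffer on return from an interrupt can be sketched as a small bounded store keyed by native PC. This is a model under assumptions: the class name, the capacity of eight entries, the oldest-first eviction, and the idea that reissue replays every stored instruction at or after the restored PC are all illustrative choices, not details given in the description.

```python
from collections import OrderedDict


class ReissueBuffer:
    def __init__(self, capacity=8):  # capacity is an assumed parameter
        self.capacity = capacity
        self.entries = OrderedDict()  # native PC -> converted native instruction

    def record(self, native_pc, instruction):
        """Store a copy of an instruction as it is sent to the CPU."""
        self.entries[native_pc] = instruction
        self.entries.move_to_end(native_pc)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry

    def reissue_from(self, native_pc):
        """Return stored instructions at or after native_pc, in PC order."""
        return [ins for pc, ins in sorted(self.entries.items()) if pc >= native_pc]
```

On return from interrupt, the restored native PC is passed to `reissue_from`, and the flushed instructions are replayed without converting the bytecodes again.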
[0046] The CPU 101 is associated with a register file
113. This register file is the native CPU's normal register
file, operably connected to the CPU's ALU but is shown
separately here for illustration. The register file 113
stores Stack and Var values which can be used by the
converted instructions. The Stack and Variable managers
114 keep track of any information stored in the
register file 113 and use it to help the microcode stage
operations. As described below, in one embodiment there
are a fixed number of registers used for Stack values
and Variable values. For example, six registers can be
used for the top six Stack values and six registers used
for six Variable values.

[0047] In another embodiment of the present inven- 
tion, the Stack and Variable manager assigns Stack and 
Variable values to different registers in the register file. 
An advantage of this alternate embodiment is that in 
some cases the Stack and Var values may switch due 
to an Invoke Call and such a switch can be more effi- 
ciently done in the Stack and Var manager 114 rather 
than producing a number of native instructions to imple- 
ment this. 

[0048] In one embodiment a number of important val- 
ues can be stored in the hardware accelerator to aid in 
the operation of the system. These values stored in the 
hardware accelerator help improve the operation of the 
system, especially when the register files of the CPU 
are used to store portions of the Java stack. 
[0049] The hardware accelerator preferably stores an 
indication of the top of the stack value. This top of the 
stack value aids in the loading of stack values from the 
memory. The top of the stack value is updated as in- 
structions are converted from stack-based instructions 
to register-based instructions. When instruction level 
parallelism is used, each stack-based instruction which
is part of a single register-based instruction needs to be 
evaluated for its effects on the Java stack. 
[0050] In one embodiment, an operand stack depth
value is maintained in the hardware accelerator. This
operand stack depth indicates the dynamic depth of the
operand stack in the CPU's register files. Thus, if four
stack values are stored in the register files, the stack
depth indicator will read "4." Knowing the depth of the
stack in the register file helps in the loading and storing
of stack values in and out of the register files.

[0051] In a preferred embodiment, a minimum stack 
depth value and a maximum stack depth value are main- 
tained within the hardware accelerator. The stack depth 
value is compared to the maximum and minimum stack 
depths. When the stack depth goes below the minimum
value, the hardware accelerator composes load instructions
to load stack values from the memory into the register
file of the CPU. When the stack depth goes above
the maximum value, the hardware accelerator composes
store instructions to store stack values back out to
the memory. 
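
By way of illustration only, the comparison of the stack depth against the minimum and maximum depths might be sketched as follows. The function name, the string labels "load" and "store", and the one-entry-at-a-time policy are assumptions made for this sketch and are not taken from the described hardware.

```python
def spill_fill_ops(depth, min_depth, max_depth):
    """Return the memory operations that bring `depth` back into range.

    Hypothetical helper: each "load" stands for one composed native load
    instruction, each "store" for one composed native store instruction.
    """
    ops = []
    while depth < min_depth:
        ops.append("load")    # refill one stack value from memory
        depth += 1
    while depth > max_depth:
        ops.append("store")   # spill one stack value back out to memory
        depth -= 1
    return ops, depth
```

With a minimum of 1 and a maximum of 4, a depth of 6 yields two store operations and a depth of 0 yields one load operation; a depth already within range yields none.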

[0052] In one embodiment, at least the top four (4)
entries of the operand stack in the CPU register file are
operated as a ring buffer, the ring buffer being maintained
in the accelerator and operably connected to an
overflow/underflow unit.

[0053] The hardware accelerator also preferably 
stores an indication of the operands and variables 
stored in the register file of the CPU. These indications 
allow the hardware accelerator to compose the converted
register-based instructions from the incoming
stack-based instructions.

[0054] The hardware accelerator also preferably
stores an indication of the variable base and operand
base in the memory. This allows for the composing of
instructions to load and store variables and operands
between the register file of the CPU and the memory.
For example, when a Var is not available in the register
file, the hardware issues load instructions. The hardware
is adapted to multiply the Var number by four and
add the Var base to produce the memory location of
the Var. The instruction produced is based on knowledge
that the Var base is in a temporary native CPU
register. The Var number times four can be made available
as the immediate field of the native instruction being
composed, which may be a memory access instruction
with the address being the content of the temporary
register holding a pointer to the Vars base plus an
immediate offset. Alternatively, the final memory location
of the Var may be read by the CPU as an instruction
saved by the accelerator and then the Var can be loaded.
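
The address arithmetic described above can be sketched as follows, for illustration only. The function names, the register names "Rtmp" and "Rdst", and the textual form of the load instruction are hypothetical; only the multiply-by-four and add-the-base calculation comes from the description.

```python
WORD_BYTES = 4  # each Var occupies four bytes, per the description


def var_address(var_base, var_number):
    """Memory location of a Var: Var base plus Var number times four."""
    return var_base + var_number * WORD_BYTES


def compose_var_load(var_number, base_reg="Rtmp", dest_reg="Rdst"):
    """Compose a load whose immediate field is the Var number times four.

    The mnemonic and register names are placeholders, not a real ISA.
    """
    offset = var_number * WORD_BYTES
    return "load {}, [{} + {}]".format(dest_reg, base_reg, offset)
```

For Var 31, the immediate offset is 124, so the composed instruction addresses the temporary register holding the Var base plus 124.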

[0055] In one embodiment, the hardware accelerator 
marks the variables as modified when updated by the 
execution of Java bytecodes. The hardware accelerator
can copy variables marked as modified to the system 
memory for some bytecodes. 
[0056] In one embodiment, the hardware accelerator 
composes native instructions wherein the native instructions'
operands contain at least two native CPU register
file references where the register file contents are the
data for the operand stack and variables.
[0057] Figure 9 illustrates a decode stage of one embodiment
of the present invention. This decode stage
102' is divided into the prefetch stage 116 and the instruction
decode 118. The prefetch stage 116 includes
a bytecode buffer and alignment prefetch stage unit 120
which receives the raw bytecodes from a memory (not
shown). The Java bytecode buffer control element 122
provides instructions to determine when to load additional
bytecodes from the memory. The address unit 124
uses the Java program counter 126 to determine the location
of the next bytecode to load. As described above,
while the hardware accelerator is active, the Java program
counter is used to get the next word from memory
containing Java bytecode. The native PC is maintained
within a spoofed region and is not used to get the next
instruction while the hardware accelerator is active. The
bytecode buffer alignment unit 120 contains a number
of bytecodes from the memory. When the instructions
are passed on from the instruction decode unit 118, a
number of bytes are removed from the bytecode buffer
alignment unit 120. A signal on line 128 indicates the
number of bytecodes which are used by the instruction
decode unit 118. In one embodiment, the decoded data
on line 130 is sent to the microcode stage. This data can
include the microcode Start Address data 130a, Index/Address
and Vars data 130b, and Var Control data 130c.
[0058] Figure 10 shows an instruction decode unit
118'. In this embodiment, a number of bytes are sent to
an Instruction Decode unit. Individual Decode units 132, 
134, 136, 138 and 140 receive and decode the bytes. 
Note that the value of adjacent bytes affects how the 
byte is decoded. For example, if byte A is the start of a 
two-byte instruction, the value of byte B is interpreted 
as the second half of the two-byte instruction. The in- 
struction level parallelism logic 142 receives the decod- 
ed information and then determines the microcode start 
address for the primary bytecode. Secondary bytecodes
can be combined with the primary bytecode by
the selection of registers accessed by the converted in- 
struction. One example of this embodiment is described 
below with respect to Figures 19 and 20. 
[0059] The accelerator ALU 144 is used to calculate 
index addresses and the like. The accelerator ALU is 
connected to the register pool. The use of the acceler- 
ator ALU allows certain simple calculations to be moved 
from the CPU unit to the hardware accelerator unit, and 
thus allows the Java bytecodes to be converted into few- 
er native instructions. The Variable Selection + Other 
Control unit 146 determines which registers are used as 
Vars. The Var control line from the ILP Logic unit 142 
indicates how these Vars are interpreted. A Var and as- 
sociated Var control line can be made available for each 
operand field in the native CPU's instruction. 
[0060] In one embodiment, the hardware accelerator 
issues native load instructions when a variable is not 
present in the native CPU register file, the memory ad- 
dress being computed by the ALU in the hardware ac- 
celerator. 

[0061] The microcode stage 104' shown in Figure 11
includes a microcode address logic 148 and microcode
memory 150. The microcode address logic sends microcode
addresses to the microcode memory 150. The
microcode memory 150 then sends the contents of that
address to the Native Instruction Composer Logic 152
which produces the native instruction. Each microcode
memory line includes a main instruction portion on line
154, control bits on line 156 and update stack pointer
bits on line 158. Together, the microcode address logic 148
and the microcode memory 150 can produce a string of native
instructions until the update stack bit is sent to the
microcode address logic 148. At that point, the microcode
address logic obtains another start address from the decode
logic (not shown). The native instruction composer
receives the main instruction portion on line 154, the
control bits from the decode, the index address, Vars,
and the Var controls. These inputs allow the native instruction
composer 152 to construct the native instructions
which are sent to the reissue buffer and the native
CPU.

[0062] Figure 12 shows a microcode address logic
148' of one embodiment of the present invention. The start
address coming from the decode logic goes to multiplexer
154. The multiplexer 154 can either send the start
address or an incremental or calculated value to the
microcode RAM. In a preferred embodiment, while the
update stack bit is not set, the address of the next element
in the microcode is calculated by the ALU 156 and provided
to the multiplexer 154 for sending to the microcode
memory (not shown). Space in the microcode RAM
memory can be conserved by including jumps to other
areas of the microcode memory. These jumps can be
done by calculation in unit 158 or by providing the address
on line 160.
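
For illustration only, the stepping of the microcode address until the update stack bit is reached might be modeled as below. The class and function names and the placeholder instruction strings are hypothetical, and this sketch assumes a simple increment for the next address; the jumps to other areas of the microcode memory are not modeled.

```python
from typing import NamedTuple


class MicrocodeLine(NamedTuple):
    main_instruction: str   # main instruction portion of the line
    update_stack: bool      # set on the last line emitted for a bytecode


def expand(microcode, start_address):
    """Step through microcode from start_address until the update stack bit."""
    emitted = []
    address = start_address
    while True:
        line = microcode[address]
        emitted.append(line.main_instruction)
        if line.update_stack:
            # A new start address would now be taken from the decode logic.
            return emitted
        address += 1  # incremented address selected by the multiplexer
```

In this model a bytecode whose microcode occupies two lines produces two native instruction portions, which is how a single bytecode can expand into several native instructions.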

[0063] Figure 13 illustrates an embodiment of a native
instruction composer unit for use with the present invention.
In this embodiment a number of register selection
logic units 162, 164 and 166 are provided. Each register
selection logic unit can be used to select a register used
with a native instruction. Special resources logic unit
168 and selection logic 170 allow the selection of special
instructions.

[0064] Figure 14 shows the register selection logic
162' of one embodiment of the present invention. The
register determination logic 172 determines from the
variable control bits, the microcode control bits and the
Stack and Vars register manager information which register
to use. For example, if the instruction is to load the
top of stack and then use this top of stack value in the next
bytecode, register determination logic 172 can be used
to determine that register R10 contains the top of stack
value and so register R10 is used in the converted instruction.

[0065] Register remapping unit 174 does register
remapping. In conventional CPUs, some registers are
reserved. Register remapping unit 174 allows the decoder
logic to assume that the Stack and Var registers
are virtual, which simplifies the calculations. Multiplexer
176 allows the value on line 171 to be passed without
being modified.

[0066] Figure 15 illustrates an embodiment of a
stack-and-variable-register manager 114'. The
stack-and-variable-register manager maintains indications of what is
stored in the variable and stack registers of the register
file of the CPU. This information is then provided to the
decode stage and microcode stage in order to help in 
the decoding of the Java bytecode and generating ap- 
propriate native instructions. 
[0067] In a preferred embodiment, one of the functions
of the Stack-and-Var register manager is to maintain
an indication of the top of the stack. Thus, if for example
registers R1-R4 store the top 4 stack values, loaded from
memory or produced by executing bytecodes, the top of the
stack will change as data is loaded into and out of the register
file. Thus, register R2 can be the top of the stack and
register R1 be the bottom of the stack in the register file.
When new data is loaded into the stack within the register
file, the data will be loaded into register R3, which
then becomes the new top of the stack; the bottom of
the stack remains R1. With two more items loaded on
the stack in the register file, the new top of stack in the
register file will be R1, but first R1 will be written back to
memory by the accelerator's overflow/underflow unit,
and R2 will be the bottom of the partial stack in the CPU
register file.
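
The R1-R4 example above can be traced with a small model, offered as a sketch only. The class name, its attributes and the list used to record write-backs are assumptions for illustration; only the wrap-around behavior and the write-back of the bottom register on overflow follow the description.

```python
class RegisterStackRing:
    """Toy model of the partial operand stack held in CPU registers."""

    def __init__(self, registers, top_index, count):
        self.registers = list(registers)
        self.top = top_index        # index of the register holding top of stack
        self.count = count          # stack entries currently held in registers
        self.written_back = []      # registers spilled to memory on overflow

    @property
    def top_register(self):
        return self.registers[self.top]

    @property
    def bottom_register(self):
        size = len(self.registers)
        return self.registers[(self.top - self.count + 1) % size]

    def push(self):
        """Advance the top of stack, spilling the bottom register if full."""
        size = len(self.registers)
        if self.count == size:
            self.written_back.append(self.bottom_register)
            self.count -= 1
        self.top = (self.top + 1) % size
        self.count += 1
        return self.top_register
```

Starting with R2 as the top and R1 as the bottom, three successive pushes select R3, R4 and then R1; the third push first records R1 as written back and leaves R2 as the bottom, matching the sequence described above.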

[0068] Figure 16 illustrates an alternate
stack-and-variable-register manager 114''. In this alternate
embodiment, a register assignment table 172 is maintained.
The register assignment table maintains an indication
of which Vars and stack variables are stored in which
registers. When an instruction is decoded it is checked
whether a Var or stack value is stored in the register file
using the register assignment table 172. If there is a
match to the incoming stack or Var value, the values
within the register file of the CPU are used. If there is no
match, the value can be loaded into the register file from
the memory and the register assignment table updated.
In one embodiment, an invoke assignment logic unit 174
is operably connected with the register assignment table.
When an invoke occurs, typically the values of some of
the stack and the Vars are switched. By reassigning the
values within the register assignment table 172 using
reassignment logic 174, the operation of the invoke can
be speeded up.

[0069] Figure 17 shows one embodiment of a native
PC monitor 110'. The native PC value is compared to a
high range register and a low range register. If the native
PC value is within this range, the hardware accelerator
is enabled using line 178. Otherwise the hardware accelerator
is disabled. The element 180 tests whether the
native PC value is coming close to the high end of the
spoof range. If so, the system induces a jump to a lower
value of the native PC unit.
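
As a sketch only, the range comparison and the jump near the high end of the spoof range might look as follows. The function names, the `margin` and `step` parameters, and the choice to jump back to the low end of the range are assumptions for illustration; the description states only that a jump to a lower value is induced.

```python
def accelerator_enabled(native_pc, low, high):
    """Enable the accelerator while the native PC is inside the range."""
    return low <= native_pc <= high


def next_native_pc(native_pc, low, high, margin, step=4):
    """Advance the native PC, jumping lower when close to the high end.

    `margin` (how close counts as "close") and `step` (instruction size)
    are hypothetical parameters of this sketch.
    """
    if high - native_pc <= margin:
        return low
    return native_pc + step
```

With a range of 0x8000 to 0x8FFF and a margin of 16, a native PC of 0x8FF8 is sent back to 0x8000, while 0x8010 simply advances to 0x8014.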

[0070] Figure 18 illustrates an embodiment of a reissue
buffer 106'. The reissue buffer receives the converted
instructions and stores them along with the associated
native PC value. As long as there is no interrupt,
the native PC value will continue to increment, and the
next instruction and current native PC are stored in the
reissue buffer and the instruction is issued to the CPU. When
an interrupt occurs, the CPU pipeline is flushed, including
non-executed instructions, of which there is a copy
in the reissue buffer. When a return from an interrupt
occurs, the CPU is flushed and the native PC value before
the interrupt is restored. This restored native PC
value matches a native PC stored in the PC value store
184, causing a buffered instruction in the old instruction
store 186 to be provided to the CPU. The old instruction
store and the PC value store are synchronized. Once
all of the old instructions are provided to the CPU 102,
the native PC value will be outside of the range of all of
the old PC values in store 184, and new converted instructions
will be provided. The depth of the reissue buffer
depends upon the number of pipeline stages in the
CPU 102 (not shown). Under certain conditions, such as
branches, the reissue buffer is flushed. As described
above, the reissue buffer eases the operation of the
hardware accelerator. The hardware accelerator need
not know the details of the return from interrupt operation
of the CPU. Thus the hardware accelerator can operate
with a variety of different CPUs without requiring
major modification of the hardware accelerator architecture.
Changes to the microcode stage are sufficient to
change the hardware accelerator so that it could be
used with different CPUs.
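
The behavior of the reissue buffer can be illustrated with the following sketch. The class and method names and the instruction strings are hypothetical, and a software queue stands in for the synchronized PC value store and old instruction store; it is not a description of the actual circuit.

```python
from collections import deque


class ReissueBuffer:
    """Toy model: converted instructions kept with their native PC values."""

    def __init__(self, depth):
        # depth would follow the number of CPU pipeline stages
        self.entries = deque(maxlen=depth)

    def record(self, native_pc, instruction):
        """Store an instruction as it is issued to the CPU."""
        self.entries.append((native_pc, instruction))

    def lookup(self, native_pc):
        """Return the stored instruction for this PC, or None if absent."""
        for pc, instruction in self.entries:
            if pc == native_pc:
                return instruction
        return None

    def flush(self):
        """Discard all entries, for example on a branch."""
        self.entries.clear()
```

After a return from interrupt, a restored native PC that matches a stored value yields the buffered instruction; once the native PC moves past all stored values, the lookup returns None and newly converted instructions would be supplied instead.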

[0071] Figures 19 and 20 illustrate the operation of
one embodiment of the system of the present invention.
In Figure 19, multiple instructions are shown being received
by the decoder stage. The top two instructions
are integer loads and the bottom instruction is an integer
add. The ideal combination of these bytecodes by the
system would be the main op code being an add and
the two loads combined together. The system tests
whether each of the Vars is in memory. In this example,
the iload 31 is not a Var which is stored in memory. Thus
the value of the Var 31 needs to be loaded from memory
into a free register. In this example, the Var base stored
in the stack manager is loaded into temp register R10.
The word is put into the top of the stack, or in this case
in the register file indicating the top of the stack.

[0072] Figure 20 illustrates an example when iload_3
and iload 5 are used. In this example, both of these Vars
are stored within the register file. Thus, the add can be
combined with the two loads. In this example, VarH is
indicated as being a 3 and VarL is indicated as being a 5.
The op type is indicated as being iadd. The VarH Control
and VarL Control indicate that the Vars are load types
and in the register file. The top of the stack modification
is +1. This is because two values are loaded upon the
stack for the two loads, and one value is removed from
the stack as a result of the main add operation.

[0073] In actuality, as can be understood with respect
to the figures described above, the Var 3 and Var 5 are
already stored within the two register files. The value of
these register files is determined by the system. The
instructions iload 3, iload 5 and iadd are done by determining
which two registers store Var 3 and Var 5 and
also determining which register is to store the new top
of the stack. If Var 3 is stored in register R9 and Var 5
is stored in register R11 and the top of the stack is to be
stored in register R2, the converted native instruction is
an add of the value within register R9 to the value within
register R11 and store the value into register R2. This
native instruction thus does the operation of three bytecodes
at the same time, resulting in the instruction level
parallelism as operated on a native CPU.
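
For illustration only, the combination of two loads and an add into a single register-based instruction might be sketched as follows. The function name, the dictionary mapping Var numbers to registers, and the "add dest, src1, src2" text form are assumptions of this sketch rather than the instruction format of any particular CPU.

```python
def fold_loads_into_add(var_h, var_l, var_registers, new_top_register):
    """Combine iload, iload, iadd into one native add when possible.

    Returns (instruction, top_of_stack_change), or None when either Var
    is not held in the register file and must first be loaded.
    """
    if var_h not in var_registers or var_l not in var_registers:
        return None
    instruction = "add {}, {}, {}".format(
        new_top_register, var_registers[var_h], var_registers[var_l]
    )
    # Each load adds one stack value; the add removes one overall.
    top_of_stack_change = 1 + 1 - 1
    return instruction, top_of_stack_change
```

With Var 3 in R9, Var 5 in R11 and R2 chosen for the new top of the stack, the sketch yields a single add into R2 and a top of the stack modification of +1, as in the example above.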
[0074] Additionally, an ALU is deployed within the
hardware accelerator. For decoded bytecode instructions
such as GOTO and GOTO_W, the immediate branch
offset following the bytecode instruction is sign extended
and added to the Java PC of the current bytecode
instruction, and the result is stored in the Java PC
register. JSR and JSR_W bytecode instructions also do
this, in addition to pushing the Java PC of the next
bytecode instruction on the operand stack.
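
The branch target calculation can be sketched as below, for illustration only. The function names are hypothetical. The sketch assumes the standard Java bytecode encoding, in which the offset follows the one-byte opcode as a big-endian value of two bytes (GOTO, JSR) or four bytes (GOTO_W, JSR_W).

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def goto_target(java_pc, offset_bytes):
    """New Java PC: current bytecode's PC plus the sign-extended offset."""
    raw = int.from_bytes(offset_bytes, "big")
    return java_pc + sign_extend(raw, 8 * len(offset_bytes))


def jsr_target_and_return(java_pc, offset_bytes):
    """Branch target plus the Java PC of the next bytecode instruction."""
    return_pc = java_pc + 1 + len(offset_bytes)  # opcode byte plus offset
    return goto_target(java_pc, offset_bytes), return_pc
```

A GOTO at Java PC 100 with offset bytes 0xFF 0xFB branches back five bytes to 95; a JSR at the same location with a two-byte offset would push 103 as the return Java PC.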
[0075] The Java PC is incremented by a value calculated
by the hardware accelerator. This increment value
is based on the number of bytes being disposed of during
the current decode, which may include more than one
bytecode due to ILP. Similarly, SiPush and BiPush
instructions are also sign extended and made available in
the immediate field of the native instruction being composed.
In some processors, the immediate field of the
native instruction has a smaller bit width than is desired
for the offsets or sign extended constants, so this data
may be read as memory mapped or I/O mapped reads.
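
By way of a sketch, the Java PC increment and the handling of push constants might be modeled as follows. The function names and the `immediate_bits` parameter are hypothetical; the width of the immediate field depends on the native CPU and is not specified here.

```python
def java_pc_increment(instruction_lengths):
    """Bytes disposed of in one decode, summed over all folded bytecodes."""
    return sum(instruction_lengths)


def push_immediate(operand_bytes, immediate_bits):
    """Sign extend a BiPush/SiPush operand and test the immediate width.

    Returns (value, fits), where fits is False when the constant would
    need to be read by another path, such as a memory mapped read.
    """
    value = int.from_bytes(operand_bytes, "big", signed=True)
    lowest = -(1 << (immediate_bits - 1))
    highest = (1 << (immediate_bits - 1)) - 1
    return value, lowest <= value <= highest
```

For the iload_3, iload 5, iadd combination, the lengths of one, two and one bytes give an increment of four. A SiPush operand of 0x1234 does not fit an eight-bit immediate field in this model but does fit a sixteen-bit one.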
[0076] While the present invention has been described
with reference to the above embodiments, this
description of the preferred embodiments and methods
is not meant to be construed in a limiting sense. For example,
the term Java in the specification or claims
should be construed to cover successor programming
languages or other programming languages using basic
Java concepts (the use of generic instructions, such as
bytecodes, to indicate the operation of a virtual machine).
It should also be understood that all aspects of
the present invention are not to be limited to the specific
descriptions, or to configurations set forth herein. Some
modifications in form and detail of the various embodiments
of the disclosed invention, as well as other variations
in the present invention, will be apparent to a person
skilled in the art upon reference to the present disclosure.
It is therefore contemplated that the following
claims will cover any such modifications or variations of
the described embodiment as falling within the true spirit
and scope of the present invention.



Claims

1. A system comprising:

a pipelined central processing unit with associated
native program counter; and
a hardware accelerator operably connected to
the central processing unit, the hardware accelerator
adapted to convert stack-based instructions
into register-based instructions native to
the central processing unit, the hardware accelerator
including a reissue buffer, the reissue
buffer adapted to store converted native instructions
issued to the CPU along with an indication
of the order of the instructions, the system
being such that when the CPU returns from an
interrupt, the reissue buffer examines the indication
to determine whether to reissue a stored
native instruction value.

2. A system comprising: 

a central processing unit; and 
a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, the hardware accel- 
erator including a microcode stage, the microc- 
ode stage including a microcode memory, the 
microcode memory output including a number 
of fields, the fields including a first set of fields 
corresponding to native instruction fields and a con-
trol bits field that affects the interpretation of the
first set of fields by microcode controlled logic 
to produce a native instruction. 

3. A system comprising: 

a central processing unit; and 
a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to receive stack-based instruc- 
tions, the hardware accelerator including a mi- 
crocode generating unit adapted to receive 
stack-based instructions and to produce there- 
from microcode instructions, the hardware ac- 
celerator also including microcode interpreta- 
tion logic adapted to receive the microcode and 
to produce therefrom native instructions which 
are sent to the central processing unit. 

4. A system comprising: 

a central processing unit; and 
a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, the hardware accelerator
storing an indication of the top of operand
stack pointer, the top of operand stack being
stored and updated in hardware, wherein
when more than one stack-based instruction is
translated into a single register-based instruction,
the top of stack pointer is modified so as
to reflect the effects of each register-based instruction,
stack-based instruction and instruction
level parallelism.

5. A system comprising: 

a central processing unit with associated regis- 
ter file; and 

a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, the hardware accel- 
erator storing an indication of the depth count 
of the portion of the operand stack stored in the 
central processing unit's register file, the depth
count being updated during the translation 
process. 

6. A system comprising: 

a central processing unit; and 
a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, the hardware accel- 
erator storing an indication of the depth count 
of the portion of the operand stack stored in the 
central processing unit's register file, the depth
count being updated during the translation 
process, the hardware accelerator checking to 
see if the stack depth is below a minimum or 
above a maximum depth, wherein if the depth 
is below the minimum depth the hardware accelerator
generates load instructions to load
operand stack data from external memory to
the register file, and wherein if the depth is
above the maximum depth the hardware accelerator
generates store instructions to move operand
stack data from the register file to the external
memory.



7. A system comprising:

a central processing unit with associated register
file; and

a hardware accelerator operably connected to
the central processing unit, the hardware accelerator
adapted to convert stack-based instructions
into register-based instructions native to
the central processing unit, the hardware accelerator
storing an indication of the operands and
variables stored in the register file of the central
processing unit, the stored indications being
used during the conversion process and being
updated by the hardware accelerator.

8. A system comprising:

a central processing unit with associated register
file; and

a hardware accelerator operably connected to
the central processing unit, the hardware accelerator
adapted to convert stack-based instructions
into register-based instructions native to
the central processing unit, the hardware accelerator
storing at least the top four (4) entries of
the operand stack in the native CPU register
file as a ring buffer, the ring buffer maintained
in the accelerator and operably connected to an
overflow/underflow unit.

9. A system comprising:

a central processing unit with associated register
file; and

a hardware accelerator operably connected to
the central processing unit, the hardware accelerator
adapted to convert stack-based instructions
into register-based instructions native to
the central processing unit, the hardware accelerator
storing Java variables in the native CPU
register file and an indication of which variables
are in the native CPU register file.

10. A system comprising:

a central processing unit with associated register
file; and

a hardware accelerator operably connected to
the central processing unit, the hardware accelerator
adapted to convert stack-based instructions
into register-based instructions native to
the central processing unit, where the hardware
accelerator composes native instructions
based on the availability of variables and operands
in the native CPU register file.

11. A system comprising:

a central processing unit with associated register
file; and

a hardware accelerator operably connected to
the central processing unit, the hardware accelerator
adapted to convert stack-based instructions
into register-based instructions native to
the central processing unit, where the hardware
accelerator marks the variables in the native
CPU register file as modified when updated by
the execution of Java byte codes.

12. A system comprising:



a central processing unit with associated regis- 
ter file; and 

a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, where the hardware 
accelerator issues native load instructions 
when a variable is not present in the native CPU 
register file, the memory address being com- 
puted by an ALU in the hardware accelerator. 

13. A system comprising: 

a central processing unit with associated regis- 
ter file; and 

a hardware accelerator operably connected to
the central processing unit, the hardware accelerator
adapted to convert stack-based instructions
into register-based instructions native to
the central processing unit, where the hardware
accelerator composes native instructions
wherein the native instructions' operands contain
at least two native CPU register file references
where the register file contents are the
data for the operand stack and variables.

14. A system comprising: 

a central processing unit with associated regis- 
ter file; and 

a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, where the hardware 
accelerator generates a new Java PC due to a
"GOTO" or "GOTO_W" byte code.

15. A system comprising: 

a central processing unit with associated regis- 
ter file; and 

a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, where the hardware 
accelerator generates a new Java PC due to a 
"JSR" or "JSR_W" byte code, computes the re- 
turn Java PC and pushes the return Java PC 
on to the operand stack. 

16. A system comprising: 

a central processing unit with associated register
file; and

a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, where the hardware
accelerator sign extends the SiPush and BiPush
byte codes and appends them to the immediate
field of the native instruction being composed.

17. A system comprising: 

a central processing unit with associated regis- 
ter file; and 

a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, where the hardware 
accelerator sign extends the SiPush and BiPush
byte codes and makes them available to be read
by the native CPU.

18. A system comprising: 

a central processing unit with associated regis- 
ter file; and 

a hardware accelerator operably connected to 
the central processing unit, the hardware accel- 
erator adapted to convert stack-based instruc- 
tions into register-based instructions native to 
the central processing unit, where the hardware 
accelerator increments the Java PC within the 
hardware accelerator by generating an incre- 
ment value based on the number of byte codes 
being disposed, wherein the Java PC is incremented
in the correct manner if multiple bytecodes
are disposed at the same time.

19. The system of Claims 1-18, wherein the stack- 
based instructions are Java bytecodes. 

20. The system of Claims 1-18, wherein the hardware 
accelerator is not flushed upon an interrupt. 

21. The system of Claims 1-18, wherein the hardware 
accelerator includes a native PC monitor which 
monitors the value of the native PC. 



22. The system of Claim 21, wherein the native PC
monitor enables the hardware accelerator when the
native program counter is within a hardware accelerator
program counter range.

23. The system of Claim 22, wherein an interrupt causes
the native PC to leave the hardware accelerator
program counter range, causing the hardware accelerator
to stall.

24. The system of Claim 23, wherein the return from
interrupt causes the native PC to go back within the 
hardware accelerator program counter range, ena- 
bling the hardware accelerator. 

25. The system of Claim 1, wherein the reissue buffer
provides stored converted instructions when the 
system returns from an interrupt. 

26. The system of Claims 1-18, wherein at least por- 
tions of the hardware accelerator are part of the 
CPU. 



27. The system of Claims 2 and 3, wherein the microcode
stage includes a microcode address logic portion
and a microcode memory portion.

28. The system of Claim 27, wherein the microcode address
logic includes logic to step through addresses
so that multiple native instructions can be produced
from fewer stack-based instructions.

29. The system of Claims 2-18, further including a reissue
buffer, the reissue buffer adapted to store converted
native instructions issued to the CPU along
with associated native program counter values, the
system being such that when the CPU returns from
interrupt, the reissue buffer examines the program
counter value to determine whether to reissue a
stored native instruction value.

30. The system of Claims 2 and 3, wherein the microcode
includes fields for native instruction portion and
fields for additional control bits.

31. The system of Claim 30, wherein the control bits
control the interpretation of fields for the native
instruction.

32. The system of Claims 2 and 3, further comprising a
decoding unit, the decoding unit being a part of the
microcode generating unit, the decoding unit producing
additional control signals which are provided
to the native instruction composer unit to produce
the native instructions.

33. The system of Claims 1-18, further comprising a
stack manager unit used to control which elements
in the stack are stored within the register file and to
produce data which is used to compose the native
instructions.

34. The system of Claim 11, wherein the hardware accelerator
copies the variables marked as modified
to the system memory for some bytecodes.

35. The system of Claim 34, wherein at least portions
of the hardware accelerator are part of the CPU.

36. The system of Claim 1, wherein the indication of the
order of the instructions is the native program counter
value.

37. The system of Claim 6, wherein if the stack depth is
above the maximum depth an overflow flag is generated.

38. The system of Claim 6, wherein if the stack depth is
below the minimum depth an underflow flag is generated.
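
By way of illustration only, and without limiting the claims, the sign extension recited in Claims 16 and 17 and the Java PC increment recited in Claim 18 may be modelled in software as in the following minimal sketch. The function and table names are hypothetical and do not appear in the specification. The 16-bit default width of the immediate field is an assumption, as is the choice to compute the increment from the encoded byte length of each bytecode (opcode plus operands); the byte lengths themselves follow the Java Virtual Machine Specification.

```python
# Illustrative software model only; all names here are hypothetical.

# Encoded length in bytes (opcode plus operands) of a few Java bytecodes,
# as defined by the Java Virtual Machine Specification.
BYTECODE_LENGTH = {
    "iconst_1": 1,
    "iload_0": 1,
    "iadd": 1,
    "bipush": 2,  # opcode + one signed immediate byte
    "sipush": 3,  # opcode + two signed immediate bytes (big-endian)
}


def sign_extend(value, bits, width):
    """Sign extend a two's-complement value of `bits` bits to `width` bits."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    extended = (value ^ sign_bit) - sign_bit  # negative when the sign bit is set
    return extended & ((1 << width) - 1)      # bit pattern of the target field


def immediate_for(opcode, operand_bytes, width=16):
    """Return the sign-extended immediate for bipush or sipush.

    `width` is the size of the native immediate field (assumed, not claimed).
    """
    if opcode == "bipush":
        return sign_extend(operand_bytes[0], 8, width)
    if opcode == "sipush":
        raw = (operand_bytes[0] << 8) | operand_bytes[1]
        return sign_extend(raw, 16, width)
    raise ValueError("opcode carries no push immediate: " + opcode)


def java_pc_increment(disposed):
    """Increment value for a group of bytecodes disposed at the same time."""
    return sum(BYTECODE_LENGTH[op] for op in disposed)


def next_java_pc(java_pc, disposed):
    """Advance the Java PC past every bytecode in the disposed group."""
    return java_pc + java_pc_increment(disposed)
```

In this sketch, disposing of `iload_0`, `bipush` and `iadd` together advances the Java PC by four bytes in a single step, which is one way of keeping the Java PC correct when multiple bytecodes are disposed at the same time.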